翻訳と辞書
Words near each other
・ Edisto Island Baptist Church
・ Edisto Island Presbyterian Church
・ Edisto Island, South Carolina
・ Edisto River
・ Edisto Rocks
・ Edistus
・ Edit
・ Edit (album)
・ EdIT (musician)
・ Edit (Regina Spektor song)
・ Edit Balázsovits
・ Edit Bauer
・ Edit Bérces
・ Edit conflict
・ Edit decision list
Edit distance
・ Edit Herczog
・ Edit Klocker
・ Edit Kovács
・ Edit Kovács (swimmer)
・ Edit mask
・ Edit menu
・ Edit Miklós
・ Edit Perényi-Weckinger
・ Edit Schlaffer
・ Edit Soós
・ Edit Vári
・ Edit-a-thon
・ Edita
・ Edita Abdieski


Dictionary Lists
翻訳と辞書 辞書検索 [ 開発暫定版 ]
スポンサード リンク

Edit distance : ウィキペディア英語版
Edit distance
In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of macromolecules such as DNA, which can be viewed as strings of the letters A, C, G and T.
Several definitions of edit distance exist, using different sets of string operations. One of the most common variants is called Levenshtein distance, named after the Soviet Russian computer scientist Vladimir Levenshtein. In this version, the allowed operations are the removal or insertion of a single character, or the substitution of one character for another. Levenshtein distance may also simply be called "edit distance", although several variants exist.
==Formal definition and properties==
Given two strings and on an alphabet (e.g. the set of ASCII characters, the set of bytes (), etc.), the edit distance is the minimum-weight series of edit operations that transforms into . One of the simplest sets of edit operations is that defined by Levenshtein in 1966:〔
:Insertion of a single symbol. If = , then inserting the symbol produces . This can also be denoted ε→, using ε to denote the empty string.
:Deletion of a single symbol changes to (→ε).
:Substitution of a single symbol for a symbol ≠ changes to (→).
In Levenshtein's original definition, each of these operations has unit cost (except that substitution of a character by itself has zero cost), so the Levenshtein distance is equal to the minimum ''number'' of operations required to transform to . A more general definition associates non-negative weight functions ins(), del() and sub(, ) with the operations.
Additional primitive operations have been suggested. A common mistake when typing text is transposition of two adjacent characters commonly occur, formally characterized by an operation that changes into where .〔
For the task of correcting OCR output, merge and split operations have been used which replace a single character into a pair of them or vice versa.
Other variants of edit distance are obtained by restricting the set of operations. Longest common subsequence (LCS) distance is edit distance with insertion and deletion as the only two edit operations, both at unit cost.〔 Similarly, by only allowing substitutions (again at unit cost), Hamming distance is obtained; this must be restricted to equal-length strings.〔
Jaro–Winkler distance can be obtained from an edit distance where only transpositions are allowed.

抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)
ウィキペディアで「Edit distance」の詳細全文を読む



スポンサード リンク
翻訳と辞書 : 翻訳のためのインターネットリソース

Copyright(C) kotoba.ne.jp 1997-2016. All Rights Reserved.